Predicting EA Sports FIFA Team of the Season in Europe’s Five Leagues
The video game FIFA, developed by Electronic Arts (EA) Sports, has become the most popular sports video game in the world in recent years, largely due to its Ultimate Team game mode. The objective of Ultimate Team is to build the best team possible by buying and selling players and by buying packs of cards, much as people buy soccer trading cards in real life. Each player receives ratings in various categories based on their real-life abilities, and each of these ratings factors into their overall rating. At the end of each season, EA Sports creates a Team of the Season (TOTS), selecting the best player at each position in each league based on how they performed in real life that season. Players who receive TOTS cards also receive a boost to their overall rating to reflect that performance. Although most TOTS choices are understandable, some confuse and sometimes anger fans, and EA has never explained how it makes its choices. Using machine learning methods and predictive modeling, we aim to determine which variables matter most when a player is chosen for TOTS, and to predict the Team of the Season for Europe’s top five leagues based on this season’s statistics.
Materials:
We retrieved complete player datasets for FIFA 17, FIFA 18, and FIFA 19 from here. We retrieved real-life statistics for the 2016-2017, 2017-2018, and 2018-2019 seasons from fbref.com. We did not use data from the 2019-2020 season because COVID-19 interrupted every league’s season in March of 2020.
Methods:
Using these datasets, we predicted Team of the Season players with a random forest machine learning model. Other models were tested, but we found that this method performed best. A random forest builds many decision trees, each of which uses the data we feed into it to predict whether a player will be in the Team of the Season. It then combines all of those trees to make a single decision on whether or not a player should be in the Team of the Season. We can then apply the model to data that it did not see during training in order to check how good it really is.
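To make the idea of combining many trees concrete, here is a minimal sketch in Python with scikit-learn. It is not the code used for this project: the data is synthetic (generated by `make_classification` as a stand-in for the player table), and the variable names are our own. It shows that the forest's predicted probability is simply the average of the individual trees' predictions, and that the fitted model can then be scored on rows it never saw.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the player table: roughly 10% positives, like TOTS cards.
X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                           weights=[0.9, 0.1], random_state=0)

# Hold back a quarter of the rows so the model can be checked on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Each tree gives its own class-probability estimate for every held-out row.
tree_probs = np.stack([tree.predict_proba(X_test) for tree in forest.estimators_])

# Averaging the trees reproduces the forest's own prediction.
averaged = tree_probs.mean(axis=0)
forest_probs = forest.predict_proba(X_test)

# Accuracy on rows the forest did not use for training.
held_out_accuracy = forest.score(X_test, y_test)
```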
Revision: Whether the card is “Normal” or “Team of the Season (TOTS)”
Int : Interceptions
TklW : Tackles Won
OG : Own Goals
Pkcon : Penalties Conceded
MP : Matches Played
Min : Minutes
Gls : Goals
Ast : Assists
Non_Pk_G : Non Penalty Goals (Goals from Open Play or Free Kicks)
Pk : Penalty Kicks
Pkatt : Penalty Attempts
CrdY : Yellow Cards
CrdR : Red Cards
G_per90 : Goals per 90 minutes
A_per90 : Assists per 90 minutes
G_plus_A_per90 : Goals plus Assists per 90 minutes
G_minus_pk_per90 : Non Penalty Goals per 90 minutes
Rk : Table Position
GF : Goals For (Goals your team has scored)
GA : Goals Against (Goals your team has conceded)
GD : Goal Difference (GF-GA)
Pts : Team Points for the Season (3 for a win, 1 for a draw, 0 for a loss)
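The per-90 and difference columns in this list follow directly from the raw counts. As a rough illustration, the Python sketch below recomputes them for one made-up player and one made-up team. The `per90` helper and all of the numbers are hypothetical, and the sketch assumes that Pk counts penalties scored, which the list above does not state explicitly.

```python
def per90(count, minutes):
    """Return how often an event happens per 90 minutes played."""
    if minutes <= 0:
        return 0.0  # avoid dividing by zero for players who never played
    return count * 90 / minutes

# Hypothetical player line (not taken from the datasets).
player = {"Gls": 12, "Ast": 6, "Pk": 2, "Min": 2700}

# Assumption: Pk is the number of penalties scored.
non_pk_g = player["Gls"] - player["Pk"]

g_per90 = per90(player["Gls"], player["Min"])
a_per90 = per90(player["Ast"], player["Min"])
g_plus_a_per90 = per90(player["Gls"] + player["Ast"], player["Min"])
g_minus_pk_per90 = per90(non_pk_g, player["Min"])

# Hypothetical team line: goal difference is goals for minus goals against.
team = {"GF": 68, "GA": 35}
gd = team["GF"] - team["GA"]
```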
In this project, we work with data from Europe’s five major soccer leagues (Premier League, La Liga, Ligue 1, Bundesliga, and Serie A) to predict the Team of the Season (TOTS) for each league. Before digging into the individual evaluation of each league, we examined each league’s averages to get an idea of its play style. The table below shows that, overall, the Premier League has the highest goal-scoring environment, with the highest average number of both Goals and Assists. This could indicate that a higher threshold of offense is required for a player at an offensive-oriented position to secure TOTS honors in the Premier League, as they are likely to face fiercer competition in terms of these counting stats. Meanwhile, La Liga and Serie A also boast relatively offensive environments, while the Bundesliga and Ligue 1 appear somewhat more defensive based on their slightly lower average number of Goals.
| League | Goals | Assists | Non PK Goals | PK | Team Rank | Minutes Per 90 | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | Minutes Per 90 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Premier League | 2.09 | 1.47 | 1.94 | 0.15 | 10.58 | 17.05 | 3.75 | 2.27 | 3.42 | 5.74 | 11.47 |
| La Liga | 2.05 | 1.41 | 1.85 | 0.20 | 10.60 | 16.80 | 3.89 | 2.04 | 3.46 | 5.82 | 10.62 |
| Ligue 1 | 1.95 | 1.30 | 1.75 | 0.21 | 10.49 | 16.94 | 3.67 | 1.95 | 3.17 | 5.77 | 11.09 |
| Bundesliga | 1.97 | 1.39 | 1.82 | 0.16 | 9.64 | 15.07 | 3.42 | 2.09 | 3.07 | 5.13 | 10.02 |
| Serie A | 2.05 | 1.35 | 1.85 | 0.19 | 10.61 | 16.51 | 3.73 | 2.06 | 3.32 | 5.77 | 11.07 |
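A table like this one can be produced with a grouped summary. The sketch below shows one way to do it in Python with pandas; the four-row `players` frame is invented for illustration, and only the column names (`Gls`, `Ast`) come from the variable list earlier in this write-up.

```python
import pandas as pd

# Tiny invented sample; the real table has one row per player card.
players = pd.DataFrame({
    "League": ["Premier League", "Premier League", "La Liga", "La Liga"],
    "Gls": [4, 0, 3, 1],
    "Ast": [2, 1, 0, 2],
})

# One row per league, with the mean and (sample) standard deviation of each stat.
league_summary = players.groupby("League")[["Gls", "Ast"]].agg(["mean", "std"])
print(league_summary.round(2))
```

Note that pandas reports the sample standard deviation (dividing by n - 1) by default.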
The Premier League is widely considered the best league in the world. It is a league full of tradition and history that has seen many dominant teams and outstanding players. In recent history the league has generally been dominated by Manchester City and Liverpool, both of which won league titles by large margins. With the influx of foreign money into the league, the talent gap between the top and the bottom has grown steadily, but those at the bottom continue to make it competitive.
Before diving into modeling, we first must explore the data to observe basic trends. First, we looked at the proportion of Premier League cards that are given the TOTS designation. Below, we see that a select few cards are given the TOTS designation.
We also wanted to look at goals scored by TOTS players versus normal players. In this density plot, we are able to see that TOTS players score significantly more goals than regular players.
We also found that final table position and player card status were highly correlated: players with TOTS cards generally played for teams that finished high in the table. In the past three years, each Team of the Season has generally been filled with many of the top teams’ players, and the density plot below reflects this.
Players who receive TOTS cards are usually the most important players to their teams, and because of this, play more minutes per contest. The density plot below is evidence of this fact.
Finally, the TOTS distribution is expected to vary from league to league, so it is important to look at the distribution specific to the Premier League. In the Premier League, the position with the highest number of TOTS cards is striker.
Before modeling the data, we must split it into training and testing sets. The training data is what we give the model to learn from, while the testing data is what we use to test the model. It is important that the Key Performance Indicators (KPIs) are similar in each set, as this indicates that the testing data is representative of the data the model learned from.
| Revision | Type | Goals | Assists | Non PK Goals | PK | Team Rank | Minutes Per 90 | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | Minutes Per 90 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | Training | 3.055851 | 2.327128 | 2.781915 | 0.2739362 | 11.069149 | 26.94010 | 3.649960 | 2.360092 | 3.264401 | 5.455811 | 5.377800 |
| Normal | Testing | 3.200000 | 2.128000 | 2.872000 | 0.3280000 | 11.816000 | 27.45058 | 3.632292 | 2.094447 | 3.113384 | 5.492493 | 5.310808 |
| TOTS | Training | 8.942308 | 5.250000 | 8.269231 | 0.6730769 | 3.557692 | 31.76645 | 8.756876 | 4.167451 | 7.819321 | 3.268468 | 3.829581 |
| TOTS | Testing | 10.470588 | 7.352941 | 10.117647 | 0.3529412 | 4.058823 | 29.53987 | 8.768678 | 4.372373 | 8.388104 | 3.230143 | 4.999096 |
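The split and the comparison table above can be sketched as follows in Python. This is an illustration under our own assumptions rather than the original code: the `cards` frame is randomly generated, the 75/25 split ratio is a guess, and stratifying on `Revision` is one reasonable way to keep the share of TOTS cards similar in both sets.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Randomly generated stand-in for one league's card table.
rng = np.random.default_rng(1)
n = 400
cards = pd.DataFrame({
    "Revision": np.where(rng.random(n) < 0.12, "TOTS", "Normal"),
    "Gls": rng.poisson(3, n),
    "Rk": rng.integers(1, 21, n),
})

# Stratify on Revision so both sets contain a similar share of TOTS cards.
train, test = train_test_split(
    cards, test_size=0.25, stratify=cards["Revision"], random_state=1)

# Label each row with its set, then summarize the KPIs by card type and set.
labelled = pd.concat([train.assign(Type="Training"), test.assign(Type="Testing")])
kpis = labelled.groupby(["Revision", "Type"])[["Gls", "Rk"]].agg(["mean", "std"])
print(kpis.round(2))
```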
After seeing that our training and testing sets had similar summary statistics, we created a random forest model to predict whether a player would be classified as TOTS or not. Our random forest model was made up of 100 decision trees. The trees are designed to be largely uncorrelated with one another, which helps provide stability and accuracy to the model. We also created a LASSO model, which filters out explanatory variables based on their importance to the outcome, using the same training and testing data; however, we found that the random forest was more accurate.
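For readers who want to try the same comparison, here is a hedged Python sketch using scikit-learn. It is not our original code: the data is synthetic, and the LASSO model is approximated by an L1-penalized logistic regression, which is our assumption about the closest classification analogue. The penalty strength `C=0.05` is an arbitrary illustrative value.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: 12 candidate variables, only 4 of which carry signal.
X, y = make_classification(n_samples=600, n_features=12, n_informative=4,
                           n_redundant=0, weights=[0.9, 0.1], random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=2)

# Random forest with 100 trees, as described in the text.
forest = RandomForestClassifier(n_estimators=100, random_state=2)
forest.fit(X_train, y_train)

# Assumption: LASSO is stood in for by L1-penalized logistic regression.
lasso = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.05),
)
lasso.fit(X_train, y_train)

forest_accuracy = forest.score(X_test, y_test)
lasso_accuracy = lasso.score(X_test, y_test)

# The L1 penalty shrinks some coefficients exactly to zero, dropping those variables.
kept_features = int((lasso.named_steps["logisticregression"].coef_ != 0).sum())
print(forest_accuracy, lasso_accuracy, kept_features)
```

Which model wins will depend on the data; on our real data the random forest was the more accurate of the two.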
Using our random forest model, we were able to observe which variables were most important to our model. It appears that goals against each player’s team, minutes played, and matches played were the most important.
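Variable importance can be read directly off a fitted forest. The Python sketch below is only an illustration: the data is random noise, and the label is deliberately driven by the column we call `Min`, so that column should come out on top. The importance measure shown is scikit-learn's impurity-based `feature_importances_`, which may differ from the measure behind our plot.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
names = ["GA", "Min", "MP", "CrdY"]

# Random, standardized stand-ins for four of the variables.
X = rng.normal(size=(500, len(names)))

# On purpose, the label depends only on the "Min" column.
y = (X[:, 1] > 1.0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=3).fit(X, y)

# Pair each variable with its importance and sort from most to least important.
ranking = sorted(zip(names, forest.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```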
The confusion matrix below shows that 17 players were classified as TOTS. 10 of these players were correctly classified, while the model felt that 7 players who were not given TOTS cards should have been given one. It also felt that 7 players who were given TOTS cards should not have been given one.
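These counts translate into the usual classification metrics. The short Python calculation below uses only the three numbers quoted above; the number of correctly classified Normal cards is not stated in this paragraph, so accuracy is left out.

```python
# Counts read off the Premier League confusion matrix described above.
true_positives = 10   # predicted TOTS, actually TOTS
false_positives = 7   # predicted TOTS, actually Normal
false_negatives = 7   # predicted Normal, actually TOTS

# Of the players the model picked, how many really got a TOTS card?
precision = true_positives / (true_positives + false_positives)

# Of the players who really got a TOTS card, how many did the model find?
recall = true_positives / (true_positives + false_negatives)

# Harmonic mean of the two.
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because the false positives and false negatives happen to be equal, all three values come out to 10/17, or about 0.588.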
Below are the players in the testing data that our model incorrectly classified. Many of these players were either undervalued or overvalued based on the performance of their team. It is clear that the choices for TOTS are somewhat subjective.
Finally, we applied our model to the Premier League stats from the 2020-2021 season. The players who were chosen for TOTS are shown below.
La Liga has been dominated for many years by Barcelona and Real Madrid, two of the most storied clubs in the world. For the past decade it has been the story of Messi vs Ronaldo, best vs best. These two clubs have won the most Champions League trophies in the last decade and it is rare that one of them does not win the league. Outside of those two clubs the league somewhat struggles for talent, especially defensively, but the gap has seen some closing in the last few years.
The first exploratory plot we looked at was the number of TOTS vs Normal cards, and as you can see there are not many team of the season players in the data set.
Then we looked at goals scored by TOTS and normal players, and while both the densities are low, TOTS players tend to score more goals.
Next, we have a density plot of table position for Team of the Season players and normal players. TOTS players tend to finish higher in the table.
Next, we have a density plot of minutes played for the team of the season players vs the normal players. Clearly the TOTS players play more.
This last exploratory plot shows us the distribution of the positions. As you can see there are not many center forwards, so those have been converted to strikers.
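Merging a sparse position into a neighboring one is a simple recode. A minimal Python sketch follows; the position codes (`CF`, `ST`, and so on) and the helper name are hypothetical, since the exact labels in the dataset are not shown here.

```python
# Hypothetical position codes: center forward (CF) is folded into striker (ST).
POSITION_RECODE = {"CF": "ST"}

def recode_position(position):
    """Return the merged position, leaving unlisted positions unchanged."""
    return POSITION_RECODE.get(position, position)

positions = ["ST", "CF", "CM", "CB", "CF"]
recoded = [recode_position(p) for p in positions]
print(recoded)
```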
| Revision | Type | Goals | Assists | Non PK Goals | PK | Team Rank | Minutes Per 90 | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | Minutes Per 90 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | Training | 2.817416 | 2.264045 | 2.539326 | 0.2780899 | 10.808989 | 26.28998 | 3.531401 | 2.079531 | 3.137438 | 5.438779 | 4.568289 |
| Normal | Testing | 3.355932 | 2.177966 | 3.050848 | 0.3050847 | 10.576271 | 26.12580 | 3.938687 | 2.205786 | 3.458787 | 5.441855 | 4.913905 |
| TOTS | Training | 10.333333 | 5.062500 | 8.833333 | 1.5000000 | 5.041667 | 29.98079 | 9.800854 | 3.628529 | 8.706401 | 4.980636 | 3.785520 |
| TOTS | Testing | 7.533333 | 4.533333 | 6.533333 | 1.0000000 | 3.266667 | 28.81852 | 10.232069 | 3.907258 | 9.287985 | 3.473711 | 3.411219 |
The plot below shows the importance of the variables in the La Liga model. The most important stats are “Minutes Played” and “Team Goal Differential”.
The confusion matrix below shows us that we classified 9 TOTS cards correctly, 114 normal cards correctly, and 10 cards in total incorrectly in the testing data for La Liga.
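The overall accuracy implied by these counts is a one-line calculation, shown here in Python using only the three numbers above:

```python
# Counts from the La Liga testing data quoted above.
correct_tots = 9
correct_normal = 114
incorrect = 10

total = correct_tots + correct_normal + incorrect
accuracy = (correct_tots + correct_normal) / total

print(f"{correct_tots + correct_normal} of {total} correct, accuracy = {accuracy:.3f}")
```

Keep in mind that accuracy is flattered by the large number of normal cards, so it says little on its own about how well the TOTS cards themselves are identified.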
Here is our predicted team of the season for La Liga 20/21:
Generally considered the worst of the top 5 European leagues, Ligue 1 has been completely dominated by PSG for many years. It is often called a “farmer’s league” and is sometimes not even considered among the best leagues in the world. However, there is no doubt that PSG is one of the best teams in the world. With the likes of Mbappe and Neymar, they managed to make it to the Champions League final last season and are in the semi-finals currently.
We began our modeling for Ligue 1 by joining the Ligue 1 datasets from 2017, 2018, and 2019.
We then began with exploratory plots. The first plot showed us how many players were given TOTS cards in the three combined datasets. We are able to see that once again only a small proportion of players are given TOTS cards.
Next, we looked at the density of goals scored between regular players and TOTS players. We were able to see that in general, a larger proportion of TOTS players score a higher number of goals.
Next, we looked at the density of table position by card type. We see that there is an even density of table position for normal cards, while the majority of TOTS players play for better teams.
We then looked at the density of minutes played per match and, unsurprisingly, players who are given TOTS cards tend to play more minutes per contest.
Finally, we looked at the distribution of TOTS cards by position. We are able to see that there is an overwhelming number of strikers and center backs in Ligue 1, that is, players who play in the center of the field.
We also evaluated the metrics between the training and testing data to see if there was a significant difference between the two. For Ligue 1, there was not a significant difference in any of the important columns.
| Revision | Type | Goals | Assists | Non PK Goals | PK | Team Rank | Minutes Per 90 | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | Minutes Per 90 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | Training | 2.874652 | 2.172702 | 2.532033 | 0.3426184 | 10.899721 | 26.99214 | 3.531724 | 2.076264 | 2.992253 | 5.400287 | 5.060310 |
| Normal | Testing | 2.613445 | 1.714286 | 2.411765 | 0.2016807 | 11.747899 | 26.66788 | 3.447203 | 1.790478 | 3.179414 | 5.642182 | 5.015616 |
| TOTS | Training | 8.958333 | 4.666667 | 7.583333 | 1.3750000 | 4.062500 | 28.75231 | 8.310358 | 3.610466 | 7.040410 | 4.107110 | 4.788361 |
| TOTS | Testing | 10.200000 | 4.533333 | 8.666667 | 1.5333333 | 2.666667 | 28.76444 | 10.516654 | 3.481926 | 8.829065 | 1.799471 | 4.595155 |
We then examined the accuracy rates of the different models in the different folds. The second model in the first fold is the most accurate at 94.3% accuracy.
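A comparison of models across folds can be sketched with cross-validation. The Python example below is an assumption-laden illustration, not our original setup: the data is synthetic, five stratified folds are used, and the two candidate models differ only in `max_features` (the number of variables tried at each split), which is one common tuning choice for random forests.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=500, n_features=8, n_informative=4,
                           weights=[0.9, 0.1], random_state=4)

# Stratified folds keep the share of positives similar in every fold.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=4)

# Hypothetical candidates: same forest, different number of variables per split.
candidates = {
    "max_features=2": RandomForestClassifier(
        n_estimators=100, max_features=2, random_state=4),
    "max_features=4": RandomForestClassifier(
        n_estimators=100, max_features=4, random_state=4),
}

# One accuracy value per fold for each candidate model.
fold_accuracy = {
    name: cross_val_score(model, X, y, cv=folds, scoring="accuracy")
    for name, model in candidates.items()
}

# Find the single most accurate (model, fold) combination.
best_name, best_fold, best_score = max(
    ((name, fold, score)
     for name, scores in fold_accuracy.items()
     for fold, score in enumerate(scores, start=1)),
    key=lambda item: item[2],
)
print(best_name, best_fold, round(best_score, 3))
```

The best single fold is usually an optimistic number; the mean accuracy across folds is a steadier basis for choosing between models.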
In this model, the most important variables are minutes played, goal differential, and goals plus assists per 90 minutes. These three variables contribute to the card classification significantly more than the other variables.
After running the random forest model, our model accuracy comes out to about 86.56%. This is likely due to many players outperforming their card rank, as well as many teams outperforming their projections.
Overall, this model predicted that 18 players met our criteria to be selected for team of the season, while also misclassifying 15 players.
The misclassified players are shown below:
The model also shows that Jonathan Bamba, Idrissa Gana Gueye, Aurelien Tchouameni, Maxence Caqueret, and Ander Herrera are the top 5 midfielders in Ligue 1.
Considered the league of the people due to its rule of forcing every club to be 51% fan owned, the German Bundesliga is considered the second best defensive league behind the Premier League. Bayern Munich have dominated the league for many years, often poaching the best players from other teams in the league.
First, we looked at how many TOTS players there are versus normal players in our Bundesliga dataset. As you can see, TOTS is a prestigious award not given to many players.
Then we looked at the density of goals scored. As we can see, both the TOTS and normal densities peak at a fairly low number of goals, but TOTS players tend to score more.
Next, we have a density plot of the table position of TOTS vs normal players, and as with the other leagues the TOTS players tend to do better.
Next, we have a density plot of the minutes played of the normal cards vs TOTS cards and it is clear that the team of the season players play much more.
Lastly, we have a distribution of the positions and of which positions received Team of the Season cards. As you can see, there are not many wingers in the Bundesliga, so they have been converted to left and right mids.
| Revision | Type | Goals | Assists | Non PK Goals | PK | Team Rank | Minutes Per 90 | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | Minutes Per 90 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | Training | 2.907143 | 2.150000 | 2.682143 | 0.2250000 | 9.753571 | 25.36694 | 3.480404 | 2.017575 | 3.146774 | 4.968718 | 3.889732 |
| Normal | Testing | 2.387097 | 2.043011 | 2.172043 | 0.2150538 | 10.279570 | 24.69164 | 3.003738 | 2.004961 | 2.619424 | 4.583443 | 3.575467 |
| TOTS | Training | 9.583333 | 6.062500 | 8.583333 | 1.0000000 | 4.458333 | 27.66389 | 7.601325 | 3.732584 | 6.772389 | 3.439188 | 3.976717 |
| TOTS | Testing | 6.937500 | 4.750000 | 5.875000 | 1.0625000 | 4.937500 | 26.58889 | 7.196932 | 4.297286 | 5.806605 | 3.923752 | 4.262849 |
Below is the variable importance plot for the Bundesliga model. The most important variables are “Minutes Played”, “Non Penalty Goals”, “Goals Against (Team)” and “Non Penalty Goals plus Assists per 90 Minutes”.
This shows us how well we did in predicting on the testing data. We predicted 7 TOTS correctly, 85 normal correctly, and 17 total incorrectly.
Here are our predicted team of the season players in the Bundesliga:
The Serie A has one of the richest histories in Europe, with the likes of AC Milan, Inter Milan, and Juventus all having great success. However, in recent history the league has been completely dominated by Juventus with them winning 9 titles in a row before being stopped this year by Inter.
First we made a bar chart to see the number of team of the season players in the Serie A.
Next we made a density plot of goals. Team of the season players tend to score slightly more goals than normal players.
Then we made a density plot of team rank of the team of the season players vs normal players. We can see that the team of the season players finish much higher in the table.
Next we made a distribution plot of how much the Team of the Season players play versus normal players. As you can see, the Team of the Season players tend to play a lot more.
We then made a plot of the positional breakdown of all the players. It seems that the distribution of the players is heavily in center backs, center mids, and strikers.
Next we made a table to compare important stats for the training and testing data.
| Revision | Type | Goals | Assists | Non PK Goals | PK | Team Rank | Minutes Per 90 | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | Minutes Per 90 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | Training | 3.077994 | 2.222841 | 2.779944 | 0.2980501 | 11.000000 | 26.89774 | 3.696524 | 2.217360 | 3.290249 | 5.392422 | 4.849271 |
| Normal | Testing | 3.159664 | 2.521008 | 2.873950 | 0.2857143 | 9.974790 | 27.16788 | 3.347104 | 2.295348 | 3.147651 | 5.812701 | 4.795889 |
| TOTS | Training | 10.037736 | 5.377358 | 8.886793 | 1.1509434 | 4.603774 | 29.95765 | 8.257775 | 3.685869 | 7.135125 | 3.465987 | 4.914517 |
| TOTS | Testing | 10.352941 | 3.529412 | 9.294118 | 1.0588235 | 2.941177 | 28.31634 | 9.191988 | 2.741296 | 7.605629 | 2.045440 | 5.374223 |
Here is a plot of the most important variables in our Serie A model. “Minutes Played”, “Tackles Won”, and “Assists” seem to be the most important.
Here is a confusion matrix of the predictions and true values for the testing data. As you can see, we predicted 9 Team of the Season players correctly and 10 incorrectly. While this is not great, it seems to be mostly acceptable because the predicted probabilities seem to be ordered fairly well.
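One way to put a number on how well the probabilities are ordered is the share of (TOTS, normal) player pairs in which the TOTS player receives the higher predicted probability, which is equivalent to the area under the ROC curve. The Python sketch below uses invented probabilities, not our model's output, and the `pairwise_auc` helper is our own.

```python
def pairwise_auc(tots_probs, normal_probs):
    """Share of (TOTS, Normal) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for t in tots_probs:
        for n in normal_probs:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tots_probs) * len(normal_probs))

# Invented predicted probabilities of receiving a TOTS card.
tots_probs = [0.90, 0.40, 0.35]          # players who did get a TOTS card
normal_probs = [0.50, 0.30, 0.20, 0.10]  # players who did not

auc = pairwise_auc(tots_probs, normal_probs)

# With a hard 0.5 cutoff, only one of the three TOTS players is caught.
caught_at_half = sum(p > 0.5 for p in tots_probs)

print(f"pairwise AUC = {auc:.3f}, TOTS caught at 0.5 cutoff = {caught_at_half}")
```

In this toy case the hard cutoff finds only 1 of 3 TOTS players, yet 10 of the 12 pairs are ordered correctly, which is the kind of pattern described above.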
Here are the players in the testing data that our model predicted incorrectly. As you can see, it is a wide variety of players, some likely predicted incorrectly due to position, others due to team performance, and others due to personal performance.
Here are the predicted team of the season players for the Serie A this year:
Here we show how Kevin De Bruyne would be modeled in all the different leagues had he played in them in order to demonstrate the similarities and differences between the models.
In all of the leagues he performs fairly well, but we can see that some of the models treat assists as a more important stat, which makes him do better. Other leagues’ models place more negative weight on the fact that he has played slightly less this season.
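Scoring the same player under several league models amounts to passing one row of stats to each fitted model. The Python sketch below illustrates the idea with two invented "leagues": one whose TOTS labels are driven mostly by assists and one driven mostly by minutes. Everything here (the data, the `fit_league_model` helper, the league names, and the player row) is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(5)

def fit_league_model(weight_ast, weight_min):
    """Fit a forest on synthetic data where TOTS depends on assists and minutes."""
    X = rng.normal(size=(400, 2))            # columns: Ast, Min (standardized)
    score = weight_ast * X[:, 0] + weight_min * X[:, 1]
    y = (score > np.quantile(score, 0.9)).astype(int)  # top 10% get TOTS
    return RandomForestClassifier(n_estimators=100, random_state=5).fit(X, y)

models = {
    "League A (assists matter most)": fit_league_model(1.0, 0.2),
    "League B (minutes matter most)": fit_league_model(0.2, 1.0),
}

# One hypothetical player: many assists, slightly fewer minutes than average.
player = np.array([[2.0, -0.5]])

tots_probability = {
    league: float(model.predict_proba(player)[0, 1])
    for league, model in models.items()
}
for league, probability in tots_probability.items():
    print(f"{league}: {probability:.2f}")
```

The same stat line earns a much higher TOTS probability in the league whose model leans on assists, which mirrors the differences we see across the real league models.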
In conclusion, we found that TOTS selection is very hard to predict. Our models did not predict the binary outcome of TOTS or not reliably, but they did seem to order the predicted probabilities fairly well. The most useful stats for our models were how well the player’s team was doing and how much the player was playing. They used other stats fairly effectively as well, but they struggled to predict players who played well on worse teams. Thus these models likely could not be used for much other than suggesting that much of what EA Sports does in picking who gets these cards is subjective. Building these models reinforced our suspicion that there is no clear method to their madness. One interesting implication is how getting or not getting one of these cards affects the public’s perception of a player: are there players who should be rated more highly by soccer fans but are not because they did not get a Team of the Season card, and vice versa?